-
Notifications
You must be signed in to change notification settings - Fork 8k
Fix SORT_REGULAR transitivity for mixed types #20315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
6e73f5a to
7f9704c
Compare
Fixes phpGH-20262: SORT_REGULAR now uses transitive comparison for consistent sorting of mixed numeric/non-numeric strings and types. Added EG(transitive_compare_mode) flag with save/restore pattern to enforce ordering: numeric-types < numeric-strings < non-numeric. This fixes sort() and array_unique() inconsistencies with mixed types, nested arrays, and objects. Comparison operators unchanged.
7f9704c to
438b127
Compare
Empty strings must sort before numbers to match PHP 8+ semantics where '' < 5 is true. Updated compare_long_to_string(), compare_double_to_string(), and zendi_smart_strcmp() to handle empty string as a special case in transitive mode.
Without initialization, transitive_compare_mode had garbage values on Windows ZTS, causing the flag to leak into normal comparisons. Initialize to false in init_executor().
withinboredom
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Some suggested additional tests that could help:
$a = ["", "0", "00", "A"];
$b = ["A", "00", "0", ""];
sort($a, SORT_REGULAR);
sort($b, SORT_REGULAR);
var_dump($a === $b); // expect true
// and expected order: ["", "0", "00", "A"]?
$a = [" 5", "+5", "-0", "0", "A"];
sort($a, SORT_REGULAR);
// expect numeric ones before "A", retaining numeric ordering: ["-0","0","+5"," 5","A"] and stable or no (+5 vs [space]5 insertion order)?
$a = ["5e2", "500", "NAN", "INF", "-INF"];
sort($a, SORT_REGULAR);
// So: ["500","5e2", "-INF","INF","NAN"]?
$a = ["0x10", "16", "0b10000"];
sort($a, SORT_REGULAR);
// expect: ["16", "0b10000", "0x10"]?
$a = [10, "3A", 5, "10", ""];
sort($a, SORT_REGULAR);
// expect: ["", 5, 10, "10", "3A"]?
$a = ["9223372036854775807", "9223372036854775808", 9223372036854775807];
sort($a, SORT_REGULAR);
// ensure consistency across LONG_MAX vs double promotionsSuggested-by: Rob Landers <[email protected]>
- Add gh20262.phpt: Bug reproduction test (replaces sort_regular_transitive*.phpt) - Add sort_variation_numeric_strings.phpt: Edge case tests for numeric strings Suggested-by: Rob Landers <[email protected]>
On x32, 9223372036854775807 becomes float instead of int.
|
Hi @jmarble. Sorry for the delayed response. One immediate concern is that this introduces an obvious inconsistency for key and value sorting. These two should remain consistent. $units = ['5', '3A'];
sort($units);
var_dump($units);
// array(2) {
// [0]=>
// string(1) "5"
// [1]=>
// string(2) "3A"
// }
$units = ['5' => 1, '3A' => 1];
ksort($units);
var_dump($units);
// array(2) {
// ["3A"]=>
// int(1)
// [5]=>
// int(1)
// }There's an obvious performance impact, given how common comparisons are (+0.4% instruction count as suggested by the Symfony Demo benchmark). Whether there's a true real-world impact remains to be seen. This is related to GH-9882. Enums are also plagued by them not being properly handled in I also wonder how this relates to @Girgias' efforts to make comparisons in PHP transitive. Maybe she can comment on that. |
|
Hi @iluuu1994 thank you for taking the time to review this so carefully. Admittedly, I got a bit too focused on resolving the issue I had run into. I'll push a commit here in a moment to resolve the discrepancy.
The CI benchmark was actually only +0.04%, so only 0.04ms on a hypothetical 100ms request scenario. I think this is acceptable for correctness. We could always try compensating with a performance improvement somewhere. I saw your email thread to internals regarding |
- Apply transitive comparison mode to ksort/krsort for consistency with sort(). - Reset transitive_compare_mode in _zend_bailout() to prevent mode leakage. - Add tests for ksort consistency and bailout scenarios.
@jmarble You're right, sorry. I mistyped. It's not completely insignificant (performance improvements are very hard to come by nowadays), but unlikely to be a blocker if we otherwise decide this is the right behavior. I do think it may be useful to consider consolidating comparison as a whole. Gina has done some work on that, as mentioned. Maybe we can solve this in a more consistent manner. If not, I don't object to your solution. The array_unique case specifically may be better served with a case that doesn't rely on sorting. Some people have suggested introducing a proper |
|
@iluuu1994 looks like we got a pretty favorable run on this last benchmark? A swing from +0.04% to -0.19% and faster across the board? Could be noise, or just more proof of negligible impact on performance.
I agree. Admittedly, I took the path of least resistance. I felt there was enough precedent for an EG flag.
Yes, a proper |
This test covers reentrancy cases with usort, uksort, sort, array_unique, and ksort.
This commit replaces manual comparisons with ZEND_THREEWAY_COMPARE to improve consistency. Additionally, it adds type-specific fast paths for integer, string, and double comparisons in array sorting.
Fixes CI error: "label followed by a declaration is a C23 extension"
|
For context, I had started thinking about what comparison semantics PHP should have and some of it is written down in one of my draft RFC: https://github.com/Girgias/php-rfcs/blob/master/comparison-equality-semantics.md However, not that comparison operators cannot be transitive, even if you just include numeric values due to In general I believe that getting rid of "ordering" for strings with the comparison operators is a better approach and have people use However, this doesn't really address |
|
@Girgias thank you for sharing your RFC draft on equality semantics -- you make a compelling case! To address your question: creating a standalone Ultimately, I felt this path of least resistance was acceptable given the amount of refactoring that would be necessary to implement a proper alternative. Perhaps this solution might at least inspire someone who might be willing to make such a refactor? In my recent commit, I found some opportunities to improve comparison performance. I noticed In real-world scenarios, I would expect no meaningful performance impact. However, the correctness bug this resolves is a demonstrable win. |
Refactors php_array_data_compare_unstable_i() to dereference before type checking and simplify control flow with if/else-if chain. This eliminates redundant mode toggling and improves branch prediction. Pass transitive_mode parameter to compare_long_to_string() and compare_double_to_string() to reduce repeated EG() calls.
Problem
array_unique()withSORT_REGULARwas missing duplicates, andsort()was producing inconsistent results depending on input order when comparing mixed types (numeric strings, non-numeric strings, integers, doubles).The root cause: SORT_REGULAR was using PHP's comparison operators (which are non-transitive by design) directly in sorting algorithms that require transitive comparisons to function correctly.
When comparing these types, the comparison created cycles:
"5" < "10"(numeric: 5 < 10)"10" < "3A"(lexicographic: '1' < '3')"5" > "3A"(lexicographic: '5' > '3') Creates a cycleThis violated the fundamental requirement of sorting algorithms: the comparison function must be transitive. This affected not only scalar strings but also nested arrays and objects with string properties.
Solution
Added
EG(transitive_compare_mode)flag tozend_executor_globalsthat enforces transitive comparison during sorting operations, with consistent ordering that matches PHP 8+ semantics: empty-strings < numeric-types < numeric-strings < non-numeric.Modified three comparison functions to check this flag:
zendi_smart_strcmp()- handles string-to-string comparisonscompare_long_to_string()- handles integer-to-string comparisonscompare_double_to_string()- handles double-to-string comparisonsThe flag is properly initialized in
init_executor()and uses save/restore pattern inphp_array_data_compare_unstable_i()to handle reentrancy correctly.Important: This fix does not change the behavior of comparison operators like
<=>, maintaining backward compatibility. The fix only affects sorting and array operations with SORT_REGULAR.Tests
Added comprehensive tests:
gh20262.phpt- Bug reproduction test covering scalars, objects, and nested arraysgh20262_bailout.phpt- Tests bailout handling to prevent mode leakagegh20262_reentrancy.phpt- Tests reentrancy handling in comparisonssort/sort_variation_numeric_strings.phpt- Edge case tests for numeric string handlingsort/ksort_variation_numeric_strings.phpt- Tests transitive mode in key sortingAll existing array sorting tests pass without modification!
Fixes #20262